Error-Correcting Codes That Nearly Saturate Shannon's Bound 
Gallager-type error-correcting codes that nearly saturate Shannon's bound are constructed using
insight gained from mapping the problem onto that of an Ising spin system. The performance of 
the suggested codes is evaluated for different code rates in both finite and infinite message length. 
Efficient information transmission plays a central role
in modern society, taking a variety of forms, from tele- 
phone and satellite communication to storing and retriev- 
ing information on disk-drives. Error-correcting codes 
are commonly used in most methods of information 
transmission to compensate for noise corrupting the data 
during transmission; they require the use of additional information
transmitted together with the data itself. The percentage of
informative transmitted bits determines the coding efficiency and,
subsequently, the speed of communication channels and the effective
storage space on hard disks. In his seminal paper of 1948, Shannon [1]
derived the channel capacity, providing bounds on the 
code-rate for which codes, capable of achieving perfect 
retrieval for a given noise level, can be found. The search 
for efficient, practical error-correcting codes that satu- 
rate Shannon's bound resulted in several practical codes, 
most of which are still below Shannon's bound. Here we 
propose a new approach based on insight gained from the 
study of Ising spin systems with low-connectivity multi-spin
interactions. Adapting our method to Gallager's error-correcting
codes [?], one obtains codes that nearly saturate the limits set by
Shannon.

In a typical scenario, a message comprising $N$ binary bits is
transmitted through a noisy communication channel; the received
string differs from the transmitted one due to noise which may flip
some bits. We identify the flip rate, $f \in [0,1]$, in a binary
symmetric channel as the fraction of bits that change their value
from 0 to 1 or from 1 to 0. We focus on this noise model as it can
be easily interpreted within the framework of Ising spin systems;
however, other noise types may also be considered, and may be more
realistic in some scenarios. The receiver can correct the flipped
bits only if the source transmits $M(f) > N$ bits; the ratio between
the original number of bits and the number of transmitted bits,
$R = N/M$, constitutes the code rate for unbiased messages.
Shannon [1] derived the channel capacity and provided bounds on the
maximal code rate $R_c$, for a given flip rate $f$ and code bit
error probability $p_b$, for which codes capable of achieving
perfect retrieval exist. The maximal code rate equals the channel
capacity and is given explicitly [?] by



$$ R_c = \frac{1 - H_2(f)}{1 - H_2(p_b)} , \qquad (1) $$

where $H_2(x) = -x \log_2(x) - (1-x) \log_2(1-x)$ is the binary entropy.
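
As a numerical illustration of Eq. (1), the sketch below evaluates the
binary entropy and inverts $R_c(f) = R$ by bisection in the limit
$p_b \to 0$. The function names and the bisection tolerance are
illustrative choices and not part of the original work.

```python
import math


def binary_entropy(x: float) -> float:
    """H_2(x) = -x log2(x) - (1 - x) log2(1 - x), with H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def critical_code_rate(f: float, p_b: float = 0.0) -> float:
    """Eq. (1): maximal code rate for flip rate f and bit error probability p_b."""
    return (1.0 - binary_entropy(f)) / (1.0 - binary_entropy(p_b))


def critical_flip_rate(rate: float, tol: float = 1e-10) -> float:
    """Flip rate in [0, 1/2] at which critical_code_rate(f) equals `rate`.

    critical_code_rate decreases monotonically on [0, 1/2], so plain
    bisection is sufficient.
    """
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if critical_code_rate(mid) > rate:
            lo = mid  # still above the target rate: more noise is tolerable
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for rate in (1 / 3, 1 / 4, 1 / 5):
        print(f"R = {rate:.4f}  ->  f_c = {critical_flip_rate(rate):.4f}")
```

For $R = 1/3$, $1/4$ and $1/5$ this should give flip rates close to
0.174, 0.2145 and 0.243, the Shannon limits quoted in Table I.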

Shannon's theory is non-constructive, and the many good algorithms
that have been introduced over the years (e.g., BCH, Reed-Muller and
Reed-Solomon codes; for a review see [?]) fall short of saturating
Shannon's bound, although they may provide close-to-optimal
performance in specific scenarios. Even the most advanced code to
date, the Turbo code [?], is somewhat below Shannon's bound.

One error-correcting code which has recently become popular is the
Gallager code [?], which was abandoned shortly after its
introduction due to the limited computational abilities of the time.
In this method, a special case of parity-check codes, the
transmitted message comprises the original message itself and
additional bits, each of which is derived from the parity of a sum
of certain message-vector bits. The choice of the message-vector
elements used for generating single codeword bits is carried out
according to a predetermined random set-up and may be represented by
a product of a randomly generated sparse matrix and the message
vector, in a manner explained below. Decoding the received message
relies on iterative probabilistic methods such as belief
propagation [?].

It has been shown that by using Gallager-type methods 
and specific choices of the encoding/decoding matrix it 
is possible to improve the maximal practically achievable 
code-rate [?], although results are still somewhat below
Shannon's capacity. The root of the problem is the in- 
evitable tradeoff between improving the code's corrective 
capabilities and the need for a practical and reliable iter- 
ative decoding process, guaranteed to converge from any 
initial condition (i.e., that will not require additional, 
typically unavailable, information about the message it- 
self). This goal is achieved by understanding the physical 
characteristics of the problem and devising a new method 
based on this insight. As Gallager-type methods form the 
basis of our proposal we will now explain explicitly the 
version we employ, the MN code [?].

In the MN code one constructs two sparse matrices $A$ and $B$ of
dimensionalities $M \times N$ and $M \times M$ respectively. The
matrix $A$ has $K$ non-zero (unit) elements per row and
$C \, (= KM/N)$ per column, while $B$ has $L$ per row/column. The
matrix $B^{-1}A$ is then used for encoding the message

$$ t = B^{-1} A \, s \pmod 2 . $$

The received message comprises the transmitted vector corrupted by
the noise vector $n$: $r = t + n \pmod 2$. Decoding is carried out
by employing the matrix $B$ to obtain $z = B(t + n) = As + Bn$, and
requires solving the equation

$$ [A, B] \begin{bmatrix} s' \\ n' \end{bmatrix} = z , $$

where $s'$ and $n'$ are the unknowns. This may be carried out using
methods of belief network decoding [?], where pseudo-posterior
probabilities, for the decoded message bits being 0 or 1, are
calculated by iteratively solving a set of equations for the
conditional probabilities of the codeword bits given the decoded
message and vice versa. For exact details of the method used and the
equations themselves see [?].
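
The encoding and syndrome steps above can be sketched on a toy
example. The sizes, the random seed, the 10% flip probability and the
simple band structure of B (identity plus one superdiagonal, chosen
only so that B is trivially invertible mod 2) are assumptions made for
illustration. The sketch checks the identity z = B(t + n) = As + Bn and
does not implement belief-network decoding.

```python
import random


def matvec_mod2(matrix, vector):
    """Matrix-vector product over GF(2) for lists of 0/1 integers."""
    return [sum(m & v for m, v in zip(row, vector)) % 2 for row in matrix]


def solve_unit_upper_triangular_mod2(B, y):
    """Solve B t = y (mod 2) by back-substitution.

    Assumes B is upper triangular with ones on the diagonal.
    """
    size = len(y)
    t = [0] * size
    for i in reversed(range(size)):
        acc = y[i]
        for k in range(i + 1, size):
            acc ^= B[i][k] & t[k]
        t[i] = acc
    return t


rng = random.Random(0)
N, M, K = 4, 8, 3  # toy sizes (assumption); code rate R = N / M = 1/2

# A: M x N with exactly K ones per row, columns chosen at random.
A = []
for _ in range(M):
    cols = set(rng.sample(range(N), K))
    A.append([1 if j in cols else 0 for j in range(N)])

# B: M x M, identity plus first superdiagonal (illustrative pattern).
B = [[1 if k in (i, i + 1) else 0 for k in range(M)] for i in range(M)]

s = [rng.randint(0, 1) for _ in range(N)]           # message
n = [1 if rng.random() < 0.1 else 0 for _ in range(M)]  # channel noise

t = solve_unit_upper_triangular_mod2(B, matvec_mod2(A, s))  # t = B^{-1} A s
r = [ti ^ ni for ti, ni in zip(t, n)]                       # r = t + n
z = matvec_mod2(B, r)                                       # z = B r

# The decoder only needs z, A and B: z equals As + Bn (mod 2).
expected = [a ^ b for a, b in zip(matvec_mod2(A, s), matvec_mod2(B, n))]
assert z == expected
```

The final assertion holds for any message and noise vector, since
B t = A s by construction and all arithmetic is modulo 2.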

Most studies of Gallager-type codes have been carried out via
methods of information theory (e.g., [?]). The first link between a
special case of Gallager's method, where $B = I$ is the identity
matrix, and the realm of physical spin systems was established by
Sourlas [10] by mapping the problem onto that of a Hamiltonian
system, replacing the original Boolean variables by binary ones
which are analogous to spins in Ising-type systems with Multi-Spin
Interactions (MSI). For this simple case the system is described by
the Hamiltonian

$$ \mathcal{H} = - \sum_{\langle i_1, i_2, \ldots, i_K \rangle} J_{\langle i_1, i_2, \ldots, i_K \rangle} \, S_{i_1} S_{i_2} \cdots S_{i_K} , \qquad (2) $$

where $\{S_i\}$ are the binary dynamical variables ($\pm 1$) used in
the decoding process. The tensor
$J_{\langle i_1, i_2, \ldots, i_K \rangle} = \pm \xi_{i_1} \xi_{i_2} \cdots \xi_{i_K}$,
with probabilities $1 - f$ and $f$ respectively, represents the
received codeword corrupted by noise during transmission, $\xi$
being the binary ($\pm 1$) representation of the original Boolean
message vector $s$; the choice of indices $i_1, i_2, \ldots, i_K$
corresponds to the non-zero row elements of the matrix $A$. Under a
gauge transformation this model is mapped onto an Ising spin system
with ferromagnetic bias; finding the ground state of the Hamiltonian
is closely related to finding the Bayes optimal posterior under a
certain noise level [10]. This mapping onto Hamiltonian spin
systems, suggested by Sourlas for highly connected systems, was
recently extended to particular forms of sparse matrices $A$ (where
$B = I$) as well as to certain $B$ matrices [?]. In this extended
framework, $K$ and $L$ represent the number of MSI among the signal
and noise components respectively.
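
A small numerical check may help to make the spin-system picture
concrete. The sketch below builds noiseless couplings from a random
$\pm 1$ message and confirms by exhaustive search that the message is a
ground state of a Hamiltonian of the multi-spin form described above.
The system size, the use of all pairs of sites ($K = 2$) and the helper
names are assumptions chosen so that exhaustive search is feasible.

```python
import itertools
import random


def make_couplings(xi, index_sets, flip_rate, rng):
    """J = +prod(xi_i) with probability 1 - f and -prod(xi_i) with probability f."""
    couplings = {}
    for indices in index_sets:
        product = 1
        for i in indices:
            product *= xi[i]
        flipped = rng.random() < flip_rate
        couplings[indices] = -product if flipped else product
    return couplings


def energy(spins, couplings):
    """H = - sum over index sets of J * (product of the selected spins)."""
    total = 0
    for indices, coupling in couplings.items():
        product = 1
        for i in indices:
            product *= spins[i]
        total -= coupling * product
    return total


rng = random.Random(1)
N, K = 6, 2  # toy values (assumption)
xi = [rng.choice((-1, 1)) for _ in range(N)]
index_sets = list(itertools.combinations(range(N), K))

J_clean = make_couplings(xi, index_sets, flip_rate=0.0, rng=rng)

E_message = energy(xi, J_clean)
E_inverted = energy([-x for x in xi], J_clean)
E_min = min(
    energy(list(state), J_clean)
    for state in itertools.product((-1, 1), repeat=N)
)

# Without noise every term contributes -1 at S = xi, so xi is a ground state.
assert E_message == -len(index_sets) == E_min
# For even K the globally inverted state has the same energy.
assert E_inverted == E_message
```

The second assertion illustrates the inversion symmetry of even
multi-spin interactions: flipping every spin multiplies each term by
$(-1)^K$.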

Our method uses the same structure as the MN codes and builds on
insight gained from the study of physical systems with symmetric and
asymmetric [?] MSI and from examining a special case of Gallager's
method [?]. These theoretical studies indicate that one may obtain
superior capabilities, in terms of the achievable code rate, by
choosing high $K$ and $L$ values; however, they come at the expense
of poor decoding performance, as the corresponding basins of
attraction shrink rapidly with increasing $K$ and $L$ values, making
it essential to have high initial overlap between the original
message and the dynamical variables for the iterative decoding
process to converge successfully. Such information is clearly
unavailable in practical scenarios. One should emphasise that the
basin of attraction shrinks dramatically. In the system suggested by
Sourlas, for instance, the initial overlap (magnetisation in the
physical system)
$m = \frac{1}{N} \sum_{i=1}^{N} (2 s_i - 1)(2 s'_i - 1)$, with $s'$
the current estimate of the message $s$, required in the case of
$K = 6$ should be higher than 0.99 for successful convergence; this
has been shown by numerical simulations as well as by a mean-field
calculation to be presented elsewhere. On the other hand, highly
robust iterative decoding is obtained for low $K$ and $L$ values at
the expense of sub-optimal capabilities (i.e., low end overlap).
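
To make the overlap (magnetisation) between the original message and
the current estimate concrete, the sketch below computes it for 0/1 bit
strings by mapping each bit to a $\pm 1$ spin. The function name is
illustrative.

```python
def overlap(message_bits, estimate_bits):
    """Overlap m = (1/N) * sum_i (2 s_i - 1)(2 s'_i - 1) for 0/1 bit lists.

    Each agreeing position contributes +1 and each disagreeing one -1,
    so m = 1 - 2 d, where d is the fraction of differing bits.
    """
    if len(message_bits) != len(estimate_bits):
        raise ValueError("bit strings must have the same length")
    total = sum(
        (2 * s - 1) * (2 * e - 1) for s, e in zip(message_bits, estimate_bits)
    )
    return total / len(message_bits)


assert overlap([0, 1, 1, 0], [0, 1, 1, 0]) == 1.0   # identical strings
assert overlap([0, 1, 1, 0], [1, 0, 0, 1]) == -1.0  # fully inverted
assert overlap([0, 1, 1, 0], [0, 1, 1, 1]) == 0.5   # one bit in four differs
```

Since $m = 1 - 2d$, an initial overlap above 0.99 means that fewer than
0.5% of the bits may differ from the original message, which shows how
narrow such a basin of attraction is.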

The method presented here is based on constructing the matrices $A$
and $B$ in a manner that corresponds to the gradual introduction of
higher-connectivity sparse sub-matrices, combining the excellent
convergence properties of codes based on low $K$ and $L$ values with
the superior performance of high-$K$ codes. More specifically, one
aims at starting with low MSI values, in this case $K + L \le 3$, so
as to bring the system to high overlap values from practically any
initial condition; higher values of $K$ and $L$, e.g.
$3 < K + L \le 5$, may then be used for bringing the system to a
perfect overlap between the decoded and the original word.

The practical implementation of the encoding is similar to that of
the MN code except that the composed matrix used, $[A|B]$, comprises
randomly chosen sparse sub-matrices of different connectivities. The
generated codeword, constructed by taking the parity of sums of
message-vector bits selected according to the specific choice of $A$
and $B$, is then transmitted through the noisy channel. Decoding the
corrupted codeword is carried out using an iterative process
identical to that of Ref. [?] and can take two forms: a) A gradual
introduction of higher-connectivity sub-matrix components in the
Hamiltonian system used for decoding, following the above
description, where the end result at each stage serves as an initial
condition for the next. This is equivalent, from a physical point of
view, to changing the Hamiltonian as a function of time by gradually
summing over more message bits in Eq. (2). b) Using the composed
matrices, including a variety of sub-matrices with different
connectivities, right from the start. The latter, which simply
corresponds to a particular construction of the matrices $A$ and $B$
in the MN code, has been used in most of our experiments due to its
simplicity, although the former has shown faster convergence at high
noise levels. In both cases the explicit choice of sites for
generating a specific codeword bit is carried out at random, in a
similar fashion to most Gallager-type codes.

The main question that should be addressed is the optimal choice of
sub-matrix connectivities. There are many possibilities for choosing
$K$ and $L$ values for the different stages and one should examine
various possibilities before arriving at the optimal configuration.
However, there are a few guidelines one should follow: 1) Initial
stages are characterised by low $K$ and $L$ values; $K$ values are
chosen gradually higher, so as to support the correction of faulty
bits. 2) One should choose the number of non-zero column elements as
uniformly as possible, as the number of connections per bit (spin)
defines the corrective input it receives (this is somewhat in
contrast to the approach adopted for irregular Gallager codes, in
which column/row connectivity is taken from some distribution [?]).
3) As in most of these systems both solutions, with $m = \pm 1$, are
equally attractive, one should break the inversion symmetry. This
may be achieved by adding some odd-MSI (i.e., an odd value for
$K + L$) to the mainly even $K + L$ value used initially; this
assists in breaking the symmetry from any initialisation of the
iterative equations [?] with practically no effect on the basin of
attraction. 4) To guarantee the invertibility of the matrix $B$, and
since noise bits have no explicit correlation, we use a patterned
structure, $B_{i,k} = \delta_{i,k} + \delta_{i,k-5}$, for the $B$
sub-matrices with $L = 2$ and $B_{i,k} = \delta_{i,k}$ for $L = 1$.
Other practical points, as well as a more detailed explanation of
the physical insight leading to the optimal choice of MSI
connectivity and the relation to Sourlas's code, will be presented
elsewhere.
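
The patterned structure of guideline 4 can be illustrated on a small
square matrix. The sketch below builds
$B_{i,k} = \delta_{i,k} + \delta_{i,k-5}$ for a hypothetical size of 12,
computes its inverse modulo 2 by back-substitution and verifies the
result. Treating the pattern as a single square block, and the helper
names, are simplifying assumptions; in the construction described above
the pattern is applied within sub-matrices of the full $B$.

```python
def patterned_b(size, offset=5):
    """B[i][k] = 1 if k == i or k == i + offset, else 0 (unit upper triangular)."""
    return [
        [1 if k in (i, i + offset) else 0 for k in range(size)]
        for i in range(size)
    ]


def inverse_unit_upper_mod2(B):
    """Inverse over GF(2) of an upper triangular matrix with unit diagonal."""
    size = len(B)
    inverse = [[0] * size for _ in range(size)]
    for col in range(size):
        # Solve B x = e_col by back-substitution, then store x as a column.
        x = [0] * size
        for i in reversed(range(size)):
            acc = 1 if i == col else 0
            for k in range(i + 1, size):
                acc ^= B[i][k] & x[k]
            x[i] = acc
        for i in range(size):
            inverse[i][col] = x[i]
    return inverse


def matmul_mod2(X, Y):
    """Matrix product over GF(2)."""
    return [
        [sum(X[i][k] & Y[k][j] for k in range(len(Y))) % 2 for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


SIZE = 12  # toy size (assumption)
B = patterned_b(SIZE)
B_inv = inverse_unit_upper_mod2(B)
identity = [[1 if i == j else 0 for j in range(SIZE)] for i in range(SIZE)]

assert matmul_mod2(B, B_inv) == identity
row_weights = [sum(row) for row in B]  # 2 wherever the offset diagonal fits
```

Because the pattern is upper triangular with a unit diagonal, its
determinant is 1 and the inverse always exists. In this single-block
toy version the last five rows carry only one non-zero element, since
the offset diagonal runs past the edge of the matrix.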



R     N       A            K    B             L    $f_c^N$   $f_c^\infty$    $f_c$

1/3   10000   N x N        1    N x 3N        2    0.159     0.169-0.170     0.174
              3/4 N x N    3    3/4 N x 3N    2
              5/4 N x N    3    5/4 N x 3N    1

1/4   30000   3/2 N x N    1    3/2 N x 4N    2    0.204     0.210-0.211     0.2145
              N/2 x N      3    N/2 x 4N      2
              2N x N       3    2N x 4N       1

1/5   36000   3N x N       1    3N x 5N       2    0.235     0.239-0.240     0.2430
              2N x N       3    2N x 5N       1

TABLE I. The critical flip rates $f_c^N$ and $f_c^\infty$ obtained by
employing our method for various code rates, in comparison to the
maximal flip rate $f_c$ provided by Shannon's bound. Details of the
specific architectures used and their row/column connectivities are
also provided.
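
As a consistency check on the architectures in Table I, the sketch
below sums the sub-matrix row counts (in units of $N$) to recover
$M/N = 1/R$, and computes the mean number of non-zero elements per
column of $A$. The pairing of each $A$ sub-matrix with the $K$ and $L$
values on the same table row is our reading of the table, and the
variable names are illustrative.

```python
from fractions import Fraction as Fr

# For each code rate: (rows of the sub-matrix in units of N, K, L).
ARCHITECTURES = {
    Fr(1, 3): [(Fr(1), 1, 2), (Fr(3, 4), 3, 2), (Fr(5, 4), 3, 1)],
    Fr(1, 4): [(Fr(3, 2), 1, 2), (Fr(1, 2), 3, 2), (Fr(2), 3, 1)],
    Fr(1, 5): [(Fr(3), 1, 2), (Fr(2), 3, 1)],
}


def summarise(blocks):
    """Return (M / N, mean non-zero elements per column of A)."""
    m_over_n = sum(rows for rows, _, _ in blocks)
    # A has N columns; a block with rows*N rows of weight K holds rows*N*K ones.
    mean_column_weight_a = sum(rows * k for rows, k, _ in blocks)
    return m_over_n, mean_column_weight_a


for rate, blocks in ARCHITECTURES.items():
    m_over_n, c_mean = summarise(blocks)
    assert m_over_n == 1 / rate  # row counts add up to M = N / R
    print(f"R = {rate}: M/N = {m_over_n}, mean column weight of A = {c_mean}")
```

With these readings the row counts add up to $3N$, $4N$ and $5N$
respectively, and the mean column weight of $A$ comes out as 7, 9
and 9.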

We conclude this presentation with a demonstration of the method's
capabilities for three different code rates, $R = 1/3$, $1/4$ and
$1/5$. In each of the cases we divided the composed matrix $[A|B]$
into up to six sub-matrices characterised by specific $K$ and $L$
values, as explained in Table I; the dimensionalities of the full
$A$ and $B$ matrices are $M \times N$ and $M \times M$ respectively.
Sub-matrix elements were chosen at random according to the
guidelines mentioned above. Encoding was carried out
straightforwardly by using the matrix $B^{-1}A$, and the corrupted
messages were decoded using the set of recursive equations of
Ref. [?], using random initial conditions. In each case, $T$ blocks
of $N$-bit unbiased messages (where exactly 1/2 of the bits are 1)
were sent through a noisy channel of flip rate $f$ (i.e., an exact
fraction $f$ of the codeword bits were flipped); both bit and block
error rates, denoted $p_b$ and $p_B$ respectively, were monitored.
We performed at least $T = 10000$ trial runs for the smaller systems
($N = 10000, 12000$) and $T = 1000$-$2000$ runs for the larger ones
($N = 30000, 36000$) for each flip-rate value, starting from
different initial conditions. These were averaged to obtain the mean
bit error rate and the corresponding variance. In most of our
experiments we observed convergence after fewer than 100 iterations,
except very close to the critical flip rate. The main halting
criterion we adopted relies on the stationarity of the first $N$
bits (i.e., the decoded message) over a certain number of
iterations. The decoding algorithm's complexity is $O(N)$, as all
matrices are sparse.
The inversion of the matrix $B$ is carried out only once
and requires $O(N^{?} \log N)$ operations.
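
The halting criterion described above, stationarity of the first $N$
bits over a number of iterations, can be sketched as a generic loop.
The `patience` and `max_iterations` parameters and the toy update rule
are hypothetical stand-ins: the text does not state the number of
iterations used, and the update below is not belief propagation.

```python
def run_until_stationary(step, state, n_message_bits, patience=5, max_iterations=1000):
    """Iterate `step` until the first n_message_bits stay unchanged.

    Returns (final_state, iterations_used, converged). Convergence is
    declared once the message part has been identical across `patience`
    consecutive updates.
    """
    unchanged = 0
    for iteration in range(1, max_iterations + 1):
        new_state = step(state)
        if new_state[:n_message_bits] == state[:n_message_bits]:
            unchanged += 1
        else:
            unchanged = 0
        state = new_state
        if unchanged >= patience:
            return state, iteration, True
    return state, max_iterations, False


def clear_first_one(bits):
    """Toy update rule: reset the first non-zero bit (stand-in for a decoder step)."""
    out = list(bits)
    for i, bit in enumerate(out):
        if bit:
            out[i] = 0
            break
    return out


final_state, iterations, converged = run_until_stationary(
    clear_first_one, [1, 1, 1, 0], n_message_bits=2, patience=3
)
assert converged and final_state == [0, 0, 0, 0]
```

Only the message part of the state is monitored, so the remaining
(noise) variables may still be changing when the loop stops.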
FIG. 1. Bit-error rate $p_b$ as a function of the flip rate $f$
for given code rates $R = 1/3$, $1/4$ and $1/5$. Results for each
code rate appear as symbols adjacent to a line representing
Shannon's theoretical bound; triangles and squares represent
mean values obtained for small and large network sizes respectively,
corresponding to $N = 10000$ and $30000$ for $R = 1/3, 1/4$, and
$N = 12000$ and $36000$ for $R = 1/5$. Predicted code-rate values
in the $N \to \infty$ limit appear as arrows on the $x$ axis.

In Table I we present the typical architectures used as well as the
maximal flip rate $f_c^N$ for which not more than a single error bit
per block has been observed on average for a particular message
length $N$, the predicted maximal flip rate $f_c^\infty$ once
finite-size effects have been considered (discussed below) and
Shannon's maximal flip rate $f_c$ defined in Eq. (1). In all these
cases one obtains, on average, perfect retrieval for noise rates
that almost saturate Shannon's bound for the critical flip rate.
Just for comparison, the corresponding results reported in Ref. [?]
for regular and irregular Gallager codes ($R = 1/4$), based on 10000
trials and $N = 16000$, report a critical value around $f = 0.160$,
in comparison to $f_c^N = 0.204$ and $f_c^\infty = 0.210$-$0.211$
reported here.



Figure 1 shows results obtained for code rates $R = 1/3$, $1/4$ and
$1/5$ and various flip rates; results for each one of the code rates
appear as symbols adjacent to a line representing Shannon's
theoretical bound for the given code rate and noise level. Triangles
and squares represent mean values obtained for small and large
network sizes respectively, corresponding to $N = 10000$ and $30000$
for $R = 1/3, 1/4$ and $N = 12000$ and $36000$ for $R = 1/5$;
variances are smaller than the symbol size. One notes the existence
of finite-size effects, manifested in the difference between the
results obtained for different system sizes. Predicted code-rate
values in the $N \to \infty$ limit, derived below, are represented
as arrows on the $x$ axis. The results clearly show that for all the
code rates examined our method comes very close to saturating
Shannon's bound.
FIG. 2. The block magnetisation profile for $R = 1/5$,
$f = 0.236, 0.237$ (solid and dashed lines respectively) and
$N = 12000, 36000$, showing the sample magnetisation $m$ vs.
the fraction of the complete set of trials. A total of 1000-10000
trials (for larger and smaller systems respectively) were rearranged
in descending order according to their magnetisation values
(directly related to the overlap between the decoded and the
original message). The fraction of perfectly retrieved blocks
increases with system size (thick lines). Inset: log-log plots of
mean convergence times $\tau$ for $R = 1/3$ and $N = 10000$ (△),
$R = 1/4$, $N = 10000$ (□) and $N = 30000$ (○), $R = 1/5$ and
$N = 36000$ (∘). The $f_c^\infty$ values were calculated by fitting
expressions of the form $\tau \propto 1/(f_c^\infty - f)$ through
the data (dashed lines for the larger systems).

The results shown so far are based on finite-$N$ simulations.
However, as Shannon's bound itself is based on infinitely large
messages, one cannot expect to saturate the bound completely for
finite-$N$ messages. To assess the critical flip rate achievable by
our method in the limit of infinitely large systems, $f_c^\infty$,
we monitor two criticality indicators: a) The dependence of the
block error distribution on the system size: the transition from
perfect retrieval ($p_B(f) = 0$) to no retrieval ($p_B(f) = 1$), as
a function of the flip rate $f$, is expected to become a step
function (at $f_c^\infty$) as $N \to \infty$. If the percentage of
perfectly retrieved blocks in the sample, for a given flip rate $f$,
increases (decreases) with $N$, one can deduce that
$f < f_c^\infty$ (or $f > f_c^\infty$). b) Convergence times as a
function of $f$: convergence times near criticality usually diverge
as $1/(f_c^\infty - f)$; by monitoring average convergence times for
various $f$ values and extrapolating, one may deduce the
corresponding critical flip rate.
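
The extrapolation in indicator b) can be sketched as follows. If
$\tau \propto 1/(f_c^\infty - f)$, then $1/\tau$ is linear in $f$ and
vanishes at $f_c^\infty$, so a straight-line least-squares fit of
$1/\tau$ against $f$ gives an estimate from its zero crossing. This
linearised fit is a simplification chosen for illustration, not
necessarily the fitting procedure used for the reported values, and the
data below are synthetic.

```python
def estimate_critical_flip_rate(flip_rates, mean_times):
    """Estimate f_c from tau ~ 1 / (f_c - f) via a linear fit of 1/tau vs f."""
    if len(flip_rates) != len(mean_times) or len(flip_rates) < 2:
        raise ValueError("need at least two (f, tau) pairs of equal length")
    inverse_times = [1.0 / tau for tau in mean_times]
    n = len(flip_rates)
    mean_f = sum(flip_rates) / n
    mean_y = sum(inverse_times) / n
    s_ff = sum((f - mean_f) ** 2 for f in flip_rates)
    s_fy = sum((f - mean_f) * (y - mean_y) for f, y in zip(flip_rates, inverse_times))
    slope = s_fy / s_ff
    intercept = mean_y - slope * mean_f
    # 1/tau = intercept + slope * f vanishes at f = -intercept / slope.
    return -intercept / slope


# Synthetic convergence times generated with f_c = 0.24 (not measured data).
TRUE_FC = 0.24
flip_rates = [0.20, 0.21, 0.22, 0.23]
mean_times = [2.0 / (TRUE_FC - f) for f in flip_rates]

estimate = estimate_critical_flip_rate(flip_rates, mean_times)
assert abs(estimate - TRUE_FC) < 1e-9
```

With noisy measurements the estimate is most sensitive to the points
closest to criticality, where convergence times are longest and hardest
to measure.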

In Fig. 2 we ordered the samples obtained for $R = 1/5$,
$f = 0.236, 0.237$ (solid and dashed lines respectively) and
$N = 12000, 36000$ according to their magnetisation; results with
higher magnetisation appear on the left and the $x$ axis was
normalised to represent fractions of the complete set of trials. One
can easily see that the fraction of perfectly retrieved blocks
increases with system size (thick lines), indicating that
$f < f_c^\infty$. Repeating the same exercise for higher $f$ values
we obtained the estimate of $f_c^\infty$ reported in Table I. In the
inset one finds log-log plots of the mean convergence times $\tau$
for $R = 1/3, 1/4, 1/5$ and different $N$ values, carried out on
blocks retrieved with fewer than 2 error bits. The optimal fitting
of expressions of the form $\tau \propto 1/(f_c^\infty - f)$ through
the data provides another indication for the $f_c^\infty$ values,
which are consistent with those obtained by the first method.

To conclude, we have shown that through a successive 
change in MSI and connectivity, while keeping the connectivity
low ($\le 5$), one can boost the performance of
matrix-based error-correcting codes, getting ever closer
to saturating the theoretical bounds set by Shannon. It is 
quite plausible that the performance reported here may 
be improved upon by fine tuning the choice of architec- 
ture, which is currently under way. Moreover, it is highly 
likely that several architectures will provide similar per- 
formance in the thermodynamic limit; it would be worth- 
while to examine their finite size behaviour above and 
below saturation which is of great practical significance. 